A Heuristic Method for Document Ranking
Abstract
In this paper, we address the efficiency of implementing the tf × idf ranking strategy with inverted files. Two search methods are studied. The first sorts the postings lists of the query terms by list length; this is the traditional sorting method used in the upper-bound search algorithm. The second sorts the postings lists by both the maximum tf and the list length. We show that the second method identifies a large portion of the top documents with relatively few disk page accesses, and that it outperforms the first method by a large margin. We also provide an estimation method that approximates the number of top documents obtained at different points of the retrieval process, which allows the user to specify the termination condition. The performance of these methods is demonstrated by experimental runs on four test collections made available with the SMART system.
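To make the contrast between the two orderings concrete, here is a minimal Python sketch of upper-bound accumulation over a tiny in-memory inverted file. It is not the authors' implementation. The index layout, the log(N/df) idf, the choice to rank lists by max tf × idf as a stand-in for "maximum tf as well as list length", the stopping test, and all names (`order_by_length`, `order_by_max_contribution`, `upper_bound_search`) are illustrative assumptions, and postings read stand in for disk page accesses. The stopping test relies on one fact: each unprocessed list can add at most its max tf × idf to any document, so once the k-th best accumulated score is at least every other score plus the sum of those bounds, the top-k set can no longer change.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory inverted file: term -> postings list of (doc_id, tf).
Index = Dict[str, List[Tuple[int, int]]]


def idf(index: Index, term: str, n_docs: int) -> float:
    # Assumed idf = log(N / df), where df is the postings-list length.
    return math.log(n_docs / len(index[term]))


def order_by_length(index: Index, terms: List[str]) -> List[str]:
    # Method 1: shortest postings list (highest idf) first.
    return sorted(terms, key=lambda t: len(index[t]))


def order_by_max_contribution(index: Index, terms: List[str], n_docs: int) -> List[str]:
    # Assumed reading of method 2: combine max tf with list length (via idf)
    # and process the list with the largest possible per-document gain first.
    return sorted(terms, key=lambda t: -max(tf for _, tf in index[t]) * idf(index, t, n_docs))


def upper_bound_search(index: Index, ordered_terms: List[str],
                       n_docs: int, k: int) -> Tuple[Set[int], int]:
    """Accumulate tf x idf scores list by list; stop once the top-k set is fixed."""
    # Largest score any single document can gain from each list.
    bounds = [max(tf for _, tf in index[t]) * idf(index, t, n_docs) for t in ordered_terms]
    scores: Dict[int, float] = {}
    postings_read = 0  # proxy for disk page accesses
    for i, term in enumerate(ordered_terms):
        w = idf(index, term, n_docs)
        for doc, tf in index[term]:
            scores[doc] = scores.get(doc, 0.0) + tf * w
        postings_read += len(index[term])
        remaining = sum(bounds[i + 1:])  # max gain still possible for any document
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k:
            top_ids = {d for d, _ in top}
            kth = top[-1][1]
            # Unseen documents score 0, so 0.0 is a safe floor.
            best_other = max((s for d, s in scores.items() if d not in top_ids), default=0.0)
            if kth >= best_other + remaining:
                return top_ids, postings_read
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return {d for d, _ in top}, postings_read


# Toy collection of 10 documents (illustrative data, not from the paper).
index: Index = {
    "rare": [(1, 1), (2, 1)],
    "common": [(3, 9), (4, 1), (5, 1), (6, 1)],
    "frequent": [(1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],
}
N_DOCS = 10
query = ["frequent", "common", "rare"]
by_length = upper_bound_search(index, order_by_length(index, query), N_DOCS, k=1)
by_max = upper_bound_search(index, order_by_max_contribution(index, query, N_DOCS), N_DOCS, k=1)
print(by_length)  # ({3}, 6)
print(by_max)     # ({3}, 4)
```

On this toy index both orderings return document 3, but the max-contribution ordering reaches a provably final answer after reading 4 postings instead of 6: processing the list with the largest possible per-document gain first shrinks the remaining bound fastest.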
Similar resources
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
The principal aim of a search engine is to return results sorted according to the user’s requirements. To achieve this, it employs ranking methods that rank web documents by their significance and relevance to the user’s query. The novelty of this paper is a user-feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Semantic Search: Document Ranking and Clustering Using Computer Science Ontology and N-Grams
Semantic similarity has become an important tool and has been widely used to solve traditional Information Retrieval problems. This study adopts an ontology of computer science, proposes an ontology indexing weight based on Wu and Palmer’s edge-counting measure, and uses the N-grams method to compute a family of word similarities. The study also compares the subsumption weight between Hliaoutakis a...
A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text
MOTIVATION To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing sy...
Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval
Background and Aim: This research investigates the impact of authors’ rank in bibliographic networks on the document-centered model of expertise retrieval. Its purpose is to find out what kind of author ranking in bibliographic networks can improve the performance of the document-centered model. Methodology: The current research is experimental. To operationalize research goals, a new test colle...
An Ensemble Click Model for Web Document Ranking
Each year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
Web pages ranking algorithm based on reinforcement learning and user feedback
The main challenge of a search engine is ranking web documents to provide the best response to a user’s query. Despite the huge number of results retrieved for a user’s query, users examine only a small number of the top results; therefore, placing relevant results in the top ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...